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Data analysis for the proposed Laser Interferometer Space Antenna (LISA) will be complicated by 
the huge number of sources in the LISA band. In the frequency band ~ 10^'' — 2 x 10""^ Hz, galactic 
white dwarf binaries (GWDBs) are sufficiently dense in frequency space that it will be impossible 
to resolve most of them, and "confusion noise" from the unresolved Galactic binaries will dominate 
over instrumental noise in determining LISA's sensitivity to other sources in that band. Confusion 
noise from unresolved extreme-mass-ratio inspirals (EMRIs) could also contribute significantly to 
LISA'S total noise curve. To date, estimates of the effect of LISA's confusion noise on matched-filter 
searches and their detection thresholds have generally approximated the noise as Gaussian, based on 
the Central Limit Theorem. However in matched-filter searches, the appropriate detection threshold 
for a given class of signals may be located rather far out on the tail of the signal-to-noise probability 
distribution, where a priori it is unclear whether the Gaussian approximation is reliable. Using the 
Edgeworth expansion and the theory of large deviations, we investigate the probability distribution 
of the usual matched-filter detection statistic, far out on the tail of the distribution. We apply these 
tools to four somewhat idealized versions of LISA data searches: searches for EMRI signals buried 
in GWDB confusion noise, and searches for massive black hole binary (MBHB) signals buried in 
i) GWDB noise, ii) EMRI noise, and iii) a sum of EMRI noise and Gaussian noise. Assuming 
reasonable short-distance cut-offs in the populations of confusion sources (since the very closest and 
hence strongest sources will be individually resolvable), modifications to the appropriate detection 
threshold, due to the non-Gaussianity of the confusion noise, turn out to be quite small for realistic 
cases. The smallness of the correction is partly due to the fact that these three types of sources 
evolve on quite different timescales, so no single background source closely resembles any search 
template. We also briefly discuss other types of LISA searches where the non-Gaussianity of LISA's 
confusion backgrounds could perhaps have a much greater impact on search reliability and efficacy. 

I. INTRODUCTION 

Data analysis for the proposed Laser Interferometer Space Antenna (LISA) will be complicated by the huge number 
of sources in the LISA band. For example, while LISA is expected to detect of order 10'* individual compact binaries 
(especially white dwarf- white dwarf binaries) in our Galaxy, in the frequency band ~ 10""* — 2 x 10"'^ Hz such binaries 
are sufficiently dense in frequency space that it will be impossible to resolve most of them. The "confusion noise" 
from all the unresolved Galactic binaries will dominate over instrumental noise in determining LISA's sensitivity 
to other sources in that band. Extreme-mass-ratio inspirals (EMRIs) are another very important category of LISA 
sources. EMRIs are inspirals of stellar-mass compact objects (white dwarfs, neutron stars, or black holes) into massive 
(~ IQ^Mq) black holes (MBHs) in galactic nuclei. Because of their extremely small mass ratio, EMRI sources remain in 
the LISA band for timescales of order years. While LISA can do a great deal of interesting science with individually 
detected EMRIs, the EMRIs that are too faint to be resolved also constitute a confusion background, partially 
masking other sources. Barack and Cutler Q (hereinafter BC2) estimated the spectral density of confusion noise 
from unresolved EMRIs and found that it becomes comparable to that of LISA's instrumental noise or WD confusion 
noise only if EMRI event rates turn out to be at the high end of the estimated range. BC2 therefore concluded that 
LISA'S EMRI confusion background would be rather benign: either the EMRI rates are low-to-medium, in which case 
non-EMRI noise sources dominate the total noise, or the EMRI rates are high, in which case noise from unresolvable 
EMRIs could dominate, but the EMRI detection rate is also higher (which would more than compensate, from a 
scientific standpoint). 

However there is a potential caveat to BC2's treatment of EMRI confusion noise (as well as to many discussions 
of the white dwarf confusion noise) related to the Gaussianity of that noise. Because the number of undetected 
GWDBs or EMRIs wiU be large (- lO"^ - 10® for GWDBs and - 10^ - 10^ for EMRIs), the confusion background 
has generally been treated as approximately Gaussian, based on an appeal to the Central Limit Theorem. However 
in matched-filter searches, the detection threshold for a given class of signals may be located rather far out on the tail 
of the signal-to-noise probability distribution. For example, in searching for EMRISithe vast number of independent 
EMRI signals that can be searched for necessitates a detection threshold of ^ 14 cr Q, assuming Gaussian statistics; 
similarly we estimate that searches for MBHBs will require a signal-to-noise detection threshold of ~ 7cr, to keep 
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false alarms at an acceptable level. Then naturally one must confront the question: How much does the tail of the 
distribution of the usual detection statistic deviate from Gaussian in the range ~ 7 — 14(t? In other words, how 
often does the confusion background manage to mimic the signals searched for, at the level of the usual detection 
threshold? Or put yet another way: how much higher must one set the detection threshold to compensate for the 
non-Gaussianity of the tail of the distribution? We analyse these questions by harnessing two tools from statistics, 
namely the Edgeworth expansion and the theory of large deviations, and applying them to model problems that are 
somewhat idealized versions of the cases that will arise in actual LISA data analysis. 

In this paper we will be concerned with three types of LISA sources, all of which are binaries: GWDBs, MBHBs, 
and EMRIs. To make the calculations below analytically tractable, we shall assume that the binary orbits are quasi- 
circular (i.e., circular except for a slow inspiral due to gravitational radiation reaction), and we shall approximate 
each gravitational waveform by its lowest-order piece in a post-Newtonian expansion. Also, while LISA should return 
two independent science data channels (and a third at high frequency), for simplicity we shall treat the output as a 
single channel. Other simplifications and approximations are discussed below. 

To avoid confusion, we should emphasize that in this paper we are mainly interested in detections near the threshold 
SNR. Now, the strongest MBHB signals detected by LISA (perhaps from ^ IO^Mq MBHs merging at redshift z < 1) 
will likely have (matched-filtering) SNRs ^ 10^; however those are not the MBHB signals that interest us here. 
Instead, when we discuss searches for MBHB signals, we are mainly interested in the most distant resolvable ones; 
e.g., mergers of ~ W^Mq MBHs at z ~ 20. Likewise when we discuss searches for EMRI signals, our interest is in 
the weakest resolvable ones. Of course, the overall detection rates will likely be dominated by the weakest detectable 
signals. 

This paper is structured as follows. We first discuss, in section [Hi the role of confusion noise in a matched filter 
search and show how it reduces to a statistical problem involving sums of independent identically distributed random 
variables. We then describe in some detail the two statistical techniques we apply in this paper, namely the Edgeworth 
expansion and the theory of large deviations. Next, in section Hill we describe in detail our toy models for the GWDB 
and EMRI confusion noise respectively. In particular we discuss our model waveforms, which are simply Newtonian 
circular-orbit chirps, and our choices for binary parameter distributions. Finally in section HVl we use confusion noise 
models of section Hill and apply the tools described in section |TT1 to four model searches: searches for EMRIs signals 
imbedded in GWDB confusion noise and searches for MBHBs imbedded in i) GWDB noise, ii) EMRI noise, and 
iii) the sum of EMRI noise and Gaussian noise (instrumental plus GWDB), respectively. In each case we obtain the 
probability distributions for the usual detection statistic and assess the impact of the non-Gaussianity of the confusion 
noise on the appropriate detection threshold. Our conclusions are summarized in section V. In an appendix we present 
a heuristic derivation of the central result in large-deviations theory (Chernoff's formula), describe its relation to the 
Edgeworth expansion, and apply it to a a simple, illustrative case-the binomial distribution. 

Throughout this paper, we use geometrical units in which G — c = 1. Therefore everything can be measured in our 
fundamental unit of seconds. For familiarity, we sometimes express quantities in terms of yr, Mpc, or Mq, which are 
related to our fundamental unit by 1 yr = 3.1556 x 10^ s, 1 Mpc 1.029 x 10^'* s, and IMq = 4.926 x 10"*^ s. 

II. STATISTICAL FOUNDATIONS 

A. Confusion noise in matched-filter searches 

As an introduction to the general problem of searching for gravitational-wave (GW) signals that may be buried in 
confusion noise, consider a LISA data set s{t) that is dominated by instrumental noise plus unresolved background 
signals, but which may also contain some resolvable signal proportional to h{t)] i.e., 

N 

s{t) ^n{t) +^h^{t) + ph{t) (2.1) 

i=l 

where n{t) represents (Gaussian) instrumental noise, the sum over N sources represents the confusion background, 
h{t) is the sought-for signal [normahzed to {h\h) — 1, where the inner product ( | ) is defined below], and p represents 
the overall strength of the sought-for signal. If h{t) is simply not present in the data, then p — 0. For example, the 
background signals hi{t) could be from GWDBs ^, while h{t) is the gravitational wave signal from some MBHB. In a 



Since the WD binaries are located much closer to us than the sought-for MBHB, their summed signal is sometimes called "foreground" 
instead of "background". However we shall refer to all confusion noise populations simply as "backgrounds". 
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matched-filter search for h{t) in this data set, one basically just computes the inner product {s\h): 



N 

is\h) = {n\h) + J2^i + P' (2-2) 

i 

where {n\h) is a Gaussian random variable, and where each a;,- = {hi\h) is a random variable drawn from some 
probability distribution function (PDF) p{x). There must be some threshold value pth, such that when {s\h) > pth 
one can claim a detection with very high confidence (say, > 99%). But what is this threshold value? To compute pth, 
we need to know the probability distribution of the sum 

JV 

X^^x,. (2.3) 

i 

Most of the work in this paper will be spent in estimating the probability distribution function for X, P^{X), given 
its parent distribution p{x). We will be particulary concerned with the behavior of Pn{X) at large X - out on the 
"high-cj" tail. In the next subsection we describe two tools from statistics that are quite useful in this context. 



B. The Central Limit Theorem and Beyond 

Let p{x) be some normalized PDF. The 5*'' moment of p{x) is defined to be 

fiq = E[x'i] = j x''p{x)dx. (2.4) 

We shall assume for convenience that /iti (the mean value of x) vanishes, since it automatically does so in all applications 
in this paper. We shall also assume that the second moment /X2 (the variance of x) exists, and define ax = \/At2- Let 
Z be the average value of N samples from this distribution: 

Z^^fl^i (2-5) 

i=l 

Then the Central Limit Theorem basically states that in the limit of large A^, the PDF for Z, Pm{Z), approaches a 
Gaussian with variance P2IN . Defining the re-scaled variable Y = ^^Z, we have 



Y21 

as A?" — > 00. 

While the Central Limit Theorem states that Pn{Y) converges to a Gaussian for large N, for this paper it is crucial 
to realize that the convergence of the ToMo Piv(y)/[(27r)-i/2e-'^ /^j ^^^^^ 

can be remarkably slow at large values 

of Y. This is particularly true if some higher moments of p{x) diverge, as happens, e.g., if p{x) has only power-law 
decrease at large x. 

To quote standard theorems on the convergence of Pn{Y) to a Gaussian, we need a few more definitions. Define 
Fn{Y) to be the cumulative distribution function (CDF) of Pn{Y), 

Fn{Y)= f PN{Y)dY, (2.7) 

J — 00 

and let Fn{Y) be the complementary function to F]sf{Y): 

PN{Y)dY =1-Fn{Y). (2.8) 

Also define $(F) to be the CDF of a Gaussian, 

^(Y)^-^j\-^'/'dY. (2.9) 
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and let 



(2.10) 



Of course, = (1/2) erfc(y/\/2), where "erfc" is the complementary error function. 

Now let us further assume that the absolute third moment ps = > of the parent distribution p(x) exists 

and is finite. For this case, a well-known result on the convergence of (|2.6p is the Berry-Esseen Theorem, which states 
that for aU F and A^, 



sup 

Y 



Fn{Y)~^{Y) 



(2.11) 



where C is some constant less than 0.7655 [ij, [lai- Of course, this is equivalent to 



sup 

Y 



FNiY)-^Y) 



(2.12) 



Now let us consider the practical implications of the Berry-Esseen Theorem. What threshold value Yth ensures 
that, say, FN{Yth) < 10~^ ? Since i>(4.8916) = 10~^, a first estimate based on the Central Limit Theorem would be 
Yth ~ 4.8916. However, by Eq. (|2.12p . the error in this estimate (for ps/u^ of order one) can be of order N"-^/^. So 
the potential error in the Gaussian estimate greatly exceeds that estimate itself unless N > IQ^^l More generally, for 
large Y, N must be exponentially large - of order -for the right-hand side of ()2.12p to be smaller than ^(Y). 

When higher moments of p{x) exist, one can systematically improve on the Central Limit Theorem estimate of 
Pn{Y). These improvements are described in the next two subsections. 



C. The Edgeworth expansion 

The key ingredient in constructing the Edgeworth expansion is the cumulant generating functional of a PDF, defined 



X{uj) = ln£;[e*'^^] = In / e^^^o;) dx 



(2.13) 



One can expand the exponential and then the logarithm about a; = in (|2.13[) to obtain the following series for the 
cumulant generating functional: 



(2.14) 



q=2 



where Kq is called the q cumulant of the parent distribution. Now consider the cumulant generating functional A{uj) 
of Pn{Y), which is given by 

(. " ^ 



(2.15) 



where = ^2- Notice that A(w) depends only on N and the cumulants of p{x). Taking the exponential of both sides 
of (|2.15p . formally expanding the results around w = 0, and then gathering terms according to powers of N^^^^ yields 



OO 



Pr{iuj) 



^ J\[r/2 
r—1 



(2.16) 



5 



where Pr{iuj) is a polynomial in iuj depending only on the cumulants Kq. Since the left-hand side of (I2.16P is simply 
the Fourier transform of Pj^(Y), taking the inverse transform on both sides of (|2.16p finally gives 



Pn{Y) 



27r 



1+E 



Qr{Y) 



where the Qr(Y) are polynomials in Y . The first few terms of the series are 



(2.17) 



Pn{Y) 



where Hq{Y) is the Chebyshev-Hermite polynomial of order q, the ones appearing above being 

HaiY) = Y^-3Y 

Hi{Y) = Y^-6Y^+3 

HeiY) = - ISr'' + 45F^ - 15 . 



(2.18) 



(2.19a) 
(2.19b) 
(2.19c) 



At this point we should emphasize that while the Edgeworth series (I2.17P is formally correct, it does not converge 
in general. Rather, in the limit A'^ — > cx) it provides an asymptotic expansion of the true CDF Fj^{Y). More precisely, 
let Ar{Y) be related to the polynomials Qr{Y) defined above by 



K{Y) = 



-Qr{J)dY. 



(2.20) 



Assume that the first k cumulants Kq exist, for some fc > 3. Also assume that lim^— ►oo supm^j^ l'^(^)l < 1? where 

v{i) = [ e''''p{x)dx. (2.21) 



(This condition on v (t) will be easily satisfied for all parent distributions p{x) we consider.) Then Theorem 3 in 
section VI of Petrov [ij] states that 



N 



hm A('^-2)/2^ 



fc--2 

Fn{y) - <f{Y) -J2n-^^^My) 



= 



(2.22) 



uniformly in Y (—oo <Y< oo). 

Assuming that p{—x) = p{x) (as will be true for all examples considered in this paper), so that the odd cumulants 
of p(a;) all vanish, and assuming the first k cumulants exist (for k even), then this theorem implies that the error in 
the (fc — 2)"^-order approximation to Fn{Y) scales like 7V~'^/^ as A^ — > oo. For example, assuming K4 exists, the error 
in the second-order Edgeworth expansion of Fn{Y) scales like A^"^ for sufficiently large A^. E.g., assuming n^/a'^ is 
of order 1, one therefore generally requires A > 10* for this potential error to be smaller than 10~^®. (Again, if one 
uses ^ 10^** independent templates in the search , then one would want the false alarm probability for any one of 
them to be smaller than ^ 10^^^.) 

Now fix Y and A^. Since the Edgeworth series is only asymptotic, one will typically find that the first few terms in 
the series might get smaller and smaller, and their sum ever closer to P/v(y), but eventually the terms in the series 
may start to grow and the sum diverges. A useful rule of thumb is then to truncate the Edgeworth expansion before 
the first term that is larger than the previous ones. 

If not all moments Kq exist, it becomes clear why the Edgeworth series cannot converge, since all terms in the 
expansion decrease exponentially with Y at large F, while Fm{Y) falls off much more slowly. To see this, consider 
the case where p{x) is an even function having a power-law tail: 



p(x) Bal 



for 



for some constant B and some odd m > 0. Let f{x) be the CDF for p{x), and let f{x) = 1 
f{x) — > [B /m){x / ax)"™ at large x. Now fix the number of samples, A^. Following Bazant 



(2.23) 

f{x). Then clearly 
jl, we note that the 
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probability that the sum '^f^iiN^^'^ax) ^Xi is greater than some value Y is clearly of the same order or greater than 
the probability that any single term in the sum is greater than y, so 

Fn{Y) > Nf{N^l^a^Y) {B /m)N^-'^Y-"' (2.24) 

at large Y. So if the parent distribution has a power-law tail, then for any fixed N, Pn{Y) has the same power-law 
fall-off at very large Y. (Of course, the above argument just shows that Fn{Y) falls off no faster than Y~"^; however 
it seems likely that Fn{Y) and f{x) fall off according to the same power law at large Y and x, respectively Q.) 
This line of reasoning suggests that the Edgeworth expansion becomes unreliable at values of Y such that 

(B/m)iVi-^r-™ > r-ie-^'/2 (2.25) 

or 

y2 > (m - 2) \nN + 2(m - 1) Iny - 21n(B/m) . (2.26) 

We shall typically be interested in cases where N is large enough that the (m — 2)lniV dominates the right-hand side 
of p.26p . In that case, we obtain the rule of thumb that the Edgworth expansion (and its first term, the Central Limit 
Theorem estimate) become unreliable for Y > \J (m — 2)lniV. So the range of validity of the Edgeworth expansion 
increases only like the square root of the exponent m describing the power law fall-off of vix). 

The situation changes dramatically if the parent distribution p(a;) falls to zero expontially (or faster) as a: — *■ oo. 
In that case large-deviations theory guarantees that Pn{Z) also has exponential fall-off as Z ^ oo. We turn to this 
subject next. 



D. The theory of large deviations 

The goal of large-deviations theory is to determine the PDF of the random variable Z, defined above in Eq. (|2.5p . on 
the high-(7 tails. From the parent distribution p{x), one begins by defining a modified cumulant generating functional 
A(/3) as 

A(/3) = In / e'^''p{x)dx . (2.27) 



[Comparing with (|2.13p . we see that this is simply the usual cumulant generating functional evaluated at imaginary 
frequency lu = —i(3.] Note that this integral does not exist unless p{x) falls to zero exponentially fast as a: ^ oo. 
To emphasize this point: if p[x) has a power-law tail as x —^ oo, then X{f3) does not exist and the results of large- 
deviations theory do not apply. In the rest of this subsection, we will assume p{x) is sufficiently well behaved at large 
X that A(/3) exists. Then the laasic result of large-deviations theory is a theorem due to Cramer (e.g., see [l^l): which 
states that 

hm lln^^.(Z)^-/(z), (2.28) 

N^oo I\ 



where 



I{Z) = max[Z/3 - A(/3)] . (2.29) 



This basically implies that for large N and arbitrary Z, the PDF of the random variable Z is well approximated by 

Pn{Z) K Cexp[~N I{Z)] (2.30) 

where C is a normalization constant. Eq. (|2.29p is sometimes referred to in the literature as Chernoff's formula, and 
the function I{Z) is called the "rate function". Clearly I{Z) is the Legendre transform of A(/3). 

There are well-known, close connections between Chernoff's formula and statistical mechanics. Roughly, /3 is like 
an inverse-temperature, A(/3) is analogous to the Helmholz free energy, and —NI{Z) is analogous to the entropy. A 
gentle introduction to large-deviations theory is given in Ref. [iq| . Since we presume most of our readers are unfamiliar 
with large deviations theory, we also give a short, heuristic derivation of Eqs. (|2.29p and (|2.30p in Appendix El 
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III. CONFUSION NOISE FROM POPULATIONS OF BINARIES 

In this paper, all the GW sources we consider are types of binaries: WD binaries, MBHBs, and EMRIs. We shall 
be considering the problems of searching for one type of binary in the confusion noise produced by a large number 
of unresolved sources of a different type; e.g., considering the search for MBHBs buried in the confusion background 
of unresolved WDs or unresolved EMRIs. Since this paper represents a first-cut at the problem of estimating the 
non-Gaussian tails of the detection statistic, we shall simplify the analysis by approximating all three types of binaries 
as being in (non-precessing) quasi-circular orbits. We further approximate the emitted gravitational waveform as a 
simple chirp, with instantaneous frequency / equal to twice the orbital frequency. Also, while the waveform that LISA 
actually measures is modulated (on a 1-year timescale) by the satellite constellation's rotational and translational 
motion, for simplicity we neglect these modulations. It should be clear from the derivations, however, that including 
LISA'S orbital modulations would have very little impact on our basic results. In the next subsection we briefly 
describe our model gravitational waveforms and their overlaps. 



A. The waveforms from circular-orbit binaries and their overlaps 



Using the quadrupole formula, one can show that (to lowest order in a post-Newtonian expansion) the instantaneous 
gravitational-wave frequency / evolves in time according to 



/ 



96 



2/3 fll/3 



(3.1) 



This is easily integrated to give 



fit) = /(O) 

where the radiation reaction timescale i^r is given by 



-1 -3/8 



(3.2) 



256 7r8/3/(0)8/3^M2/3' 



(3.3) 



We assume that the gravitational wave strain hi{t) detected by LISA due to a binary (labeled by i) located at 
distance Di from the solar system assumes the form 



h,{t) = Ao^[7rA^./.(t)]'/'cos 



LP, + 2tt / f,{t')dt' 




(3.4) 



where ipi is a random initial phase and Aq is an overall factor of order one (discussed below). Again, fi{t) = 
/i(0)(l — t/t„^i)^^^^ , and henceforth we will refer to fi{0) as simply fi. The quantity A4i is the binary chirp mass 
defined as 



(3.5) 



where Mi and are the (locally measured) total and reduced masses respectively. 

Equation (|3.4p is valid as long as the binary is close enough so that cosmological effects can be neglected. This is 
certainly true for galactic white dwarf binaries. However this is generally not the case for EMRIs or MBHBs, which 
will typically be at cosmological distances. For cosmologically distant binaries, we instead have [l2|: 



DL[Zi) 

The quantity A4i{zi) is the redshiftcd chirp mass defined as 







(3.6) 



M,{z,) - (l + z,)M-/V-^'- 



(3.7) 
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The quantity Dl{z) is the standard luminosity distance at redshift z for a fiat universe, namely 



Dl{z) = 



(1 + ^) 



dz' 



(3., 



In this paper we use the values = 0.3 and f^A — 0.7. (It turns out that we do not require a precise value for 
for our analyses, since this factor just gets absorbed into a quantity representing the total number of EMRIs out to 
some maximum redshift.) 

In reality the overall factor depends on the four angles in the problem (the source's sky location and orienta- 
tion), and is in fact time- varying due to LISA's changing antenna pattern. However for this paper we neglect those 
dependencies-in effect approximating Aq by its rms value. We would not expect this approximation to greatly affect 
the overall shape of p{x). Moreover, we expect that at large x, p{x) is dominated by background sources that are 
close (small Di), rather than ones with particularly favorable orientations. Since it is primarily the tail of p{x) that 
determines the behavior of Pn{X) at large X, we do not expect this averaging over angles to greatly affect our 
conclusions. 

We write the confusion noise strain c(t) as follows 



= ^ Ai (t) cos 



ip^ + 2n I Mt')dt' 





(3.9) 



where 



A,it)=Ao^[7^M^Mt)] 



2/3 



(3.10) 



For our purpose, the quantity of interest is the overlap X = {c\h) of a given normalized template h (from a given class 
of sought-for sources) with the confusion noise c: 



{c\h)=J2ih^\h), 



(3.11) 



where the inner product {hi\h) is defined as 



(h\h)-2 r^i^nlmdf 



(3.12) 



where hi{f) and h{f) are the Fourier transforms of hi{t) and h{t), respectively, and S'„(|/|) is the one-sided noise 
spectral density. In all searches we consider in this paper, the template will also be a Newtonian chirp of the form 



h{t) = A{t) cos 
= 2 



If + 2t: f{t')dt' 



5nl 



cos 



where the normalization condition {h\h) — 1 implies 

/ = 



f-'^'df 

Snif) " 



ip + 2n I f{t')dt' 





(3.13) 



(3.14) 



Both hi{t) and h{t) are instantaneously monochromatic signals with slowly varying frequencies. Consider the tracks 
fi{t) and f{t) that their frequencies sweep out in the t~ f plane. The integral in Eq. (|3.12[) is dominated by the point 
where the two tracks cross. Using the stationary phase approximation, the integral can be approximated as [3] 



{h^\h) 



Sn[h{U)] 



A,{U)A{U)\5Uti)\-^'^cos[5^, + sgn{5h)Ti/A\, 



(3.15) 
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where 



3 f(t V^/^ 



and 



(3.16) 



S^, = ip- if, + 27r / [fit') - Mt')]dt'. 



(3.17) 



The time ti is the instant of time when the template and the zth binary cross in the time-frequency plane, i.e. 
when /(<i) = fiiti). (If the template and the i*'' binary either do not cross in the time-frequency plane, or cross 
outside LISA'S sensitivity band or outside the observation period, then we approximate their overlap by zero. In the 
applications below this is implemented by restricting the integration range over binary parameters.) 

A GW background is essentially a distribution of unresolved signals. In the next two subsections we introduce 
model distributions for the GWDB and EMRI backgrounds, respectively. 



B. Binary parameters for galactic white dwarf binaries 



Here we present our model distribution for the galactic WD binaries. For simplicity, we will assume that all WD 
binaries have the same chirp mass, for which we adopt the median value arising from recent population synthesis 
calculations: Mc — 0.25Mq [ii|. (This is approximately the Mc for a binary composed of two 0.3Mq WDs.) We 
further assume that the other binary parameters are drawn from the following distributions: 

2D 

p{D,)dD, = 0(A-i^min(/O)^(^max- A)7^2 FTWTT'^^'' ^^-^^^^ 

p(/^)d/^ - ^(/^-/min)^(/max-/.):n^ ff^) d/„ (3.18b) 



3/'min \ /min 



p{ipi)d(p, = (3.18c) 

ZTT 

where, in (|3.18bp . fi is the initial (i.e., at the beginning of the data set) gravitational- wave frequency of the binary, 
and /mill and /max represent some low- and high-frequency cut-offs for the population we are considering. For our 
applications, we shall generally take /min = lO^'^Hz and /max = 10~^Hz. The scaling p{fi) ex f^^^^^ just comes from 
the assumption that binaries are "born" at frequencies below /min and then evolve according to Eq. (j3.H) . (Basically, 
binaries evolve much faster at higher frequency, and so are correspondingly sparser there.) 

The distance probability distribution p{Di) assumes that all galactic WD binaries are uniformly distributed in a 
disk of radius -Dmax = lOkpc, centered on our Solar System. Clearly this inaccurate in two ways. First, the Solar 
System is not at the center of the Milky Way (nor is the Milky Way a uniform disk) . However the non- uniformity and 
non-centeredness (around us) of the galactic disk clearly mostly affects the distribution of binaries more distant than a 
few fcpc, and these are not among the strongest sources. The distant binaries do not strongly affect the "high-cr" tail of 
p{x), which is what is crucial for determining the tail of P/v(X). Therefore this aspect of the uniform disk assumption 
should be fairly harmless. More problematic is that disk model departs significantly from reality at distances less 
than the thickness of the disk, which is ~ 600 pc. Below this distance, it would be better to approximate the WD 
distribution as spherical. However we shall not do this for the following reason. Assuming that the distibution is planar 
clearly overestimates the number of nearby WD binaries, which artificially amplifies the high-cr tail oi p{xi) (since the 
closest background sources have the largest overlaps with any searched-for signal). At the end of our analysis, we shall 
find that, even with the uniform-disk distribution, the non-Gaussianity of the WD background is a negligible factor 
in searching for MBHBs or EMRIs. Had we correctly modified p{Di) for D < 600 pc, the non-Gaussianity would still 
be negligible-even more so. That is, while the uniform-disk assumption is hard to justify a priori, it is fully justified 
a posteriori. 

Finally, we discuss the inner cutoff -Dmin(/j)- The justification for imposing an inner cut-off Dminifi) is that within 
this distance any WD binary of frequency fi would be so bright that it could immediately be found in the data 
and essentially subtracted out, before searching for other types of signals. So when we state results, "the GWDB 
background" is really short for "the GWDB background minus the very brightest, immediately identifiable GWDB 
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i{fi)'? A straightforward calculation shows that the (sky- averaged) LISA signal- 
to-noise for a white dwarf binary at distance D is 



This is the combined SNR from LISA's A and E channels, assuming that the noise is dominated by WD confusion 
noise with (sky-averaged) spectral density SnU) = Sf~'^/^, with S — 1.44 x 10~^''Hz*^'^. Let pth the signal-to-noise 
threshold, such that GWDBs with SNR > pth are immediately subtracted from the data (or otherwise accounted for) 
before searching for (high-z, weaker) MBHBs or EMRIs. We shall take pth = 50 as a reasonable fiducial value. 

It is absolutely crucial that there be some such threshold. Since Xi cx \/ Di, if there were no threshold we would 
have p{xi) oc at large Xi, and therefore Pn{X) would fall off only as at large X. We believe the cut-off 
is physically reasonable, since there is no reason one cannot subtract off the very bright sources before looking for 
weaker ones. (Of course, at the very end of the data analysis one will want to find the joint best fit for all sources, 
which will involve re-adjusting the parameters of all the sources, including the ones that were initially "subtracted 
out".) 

Note, however, that we are imagining removing only the very strongest GWDBs in the chosen band. Now, we expect 
that LISA data analysis will actually proceed in stages, and our GWDB model basically represents this background at 
a rather early stage in the analysis. At a later stage, we expect that that it will be possible to identify and subtract out 
all GWDBs with frequencies above a few mHz. In principle we could certainly adjust the value of /max in Eq. (j3.18bp 
for different stages in the data analysis , but for simplicity in this paper we just adopt one fixed value for /max- 

Finally the distribution (|3.18cp for p(<pi) simply states that the initial orbital phase (and hence also the GW phase) 
of each binary is random and uniformly distributed. This uniform distribution in initial phase is what leads to 
p{x) = p{~x); i.e. negative values of {h\hi) are just as likely as positive ones. 



C. Distribtution of binary parameters for EMRIs 



We next turn to unresolved EMRIs as a source of confusion noise. There are three types of EMRIs, since the 
inspiraling compact object can be a WD, a neutron star (NS), or a BH. BC2 estimated the spectral density of 
confusion noise from each of these populations. The estimates are uncertain by at least an order of magnitude, due to 
the uncertainty in EMRI capture rates. EMRI confusion noise could end up being comparable to LISA's instrumental 
noise, and perhaps even comparable to GWDB confusion noise, for rates at the high end of the estimated range. 
The WDs and NSs would at first seem to represent a bigger confusion problem, since more than 90% of the GW 
signal from NS and WD EMRIs will come from unresolvable sources; i.e., the NS and WD signals mostly represent 
confusion noise. BHs are more massive and so give stronger, more readily resolved signals; perhaps only 30% of the 
GW signal from all BH EMRIs is unresolvable. Nevertheless, since mass segregation tends to concentrate the heavier 
BHs closer to the MBH, and since supernova kicks may effectively empty the inner few parsecs of NSs, our judgement 
is that, of the three source types, BHs are the most likely to lead to substantial confusion noise. For this reason, 
and for simplicity, we consider a population model composed entirely of BH EMRIs. Since the total signal from BH 
EMRIs will be dominated by events at cosmological distances, our model takes cosmological effects into account, and 
we include the effects of evolution in both the MBH mass and the event rate. We adopt the following distribution as 
our population model: 
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p{Mi)dM, = e[M, - 10^(1 + z,;)"°-^M0]6i[lO^(l + z,)-°-'^Mq - M, 

M, 
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^24 



(1 + z.)-O-6Af0 



M,; 



(3.21a) 



p{^li)d^l^ = -5M0]^?[15M0-^,]-^^, (3.21b) 

m3 fJLi 

p{zi)dz, = 0(z, -Ze)^(2-0.)A^(^c)[i?o^L(^^)]'^===^=dz,, (3.21c) 

^0.3(1 + Zi)-^ + 0.7 

p{(pi)dip, = (3.21d) 

p(/.)rf/. = ^^(/.-/min)^ fy^) ^ (3.21e) 

'J /min \ /min / 

Here Mi and /Zi are the locally-measured masses of the MBH and stellar-mass BH, respectively. The mass probability 
distributions p.21ap and (|3.21bp are derived from the following considerations. The actual distribution p{ni) is very 
poorly known, but seems centered on fj,i « IOM0, so we simply assume a flat distribution between 5 and 15M0. For 
p{Mi) we restrict attention to the MBHs that today have masses between 10^ and 1O^M0, and we assume that their 
masses have been increasing in time like i^/"^ oc (1 -I- z)~°-^, as they continuously swallow gas and compact objects. 
The locally measured distribution of compact object masses is assumed independent of time. The dependence of 
probability distributions p.21ap and (j3.21bD on M and /i respectively is obtained from assumptions on the scaling of 
merger rates with masses. Let N be the number of mergers with masses comprised between M and M + dM and fi 
and /i -|- d/i. The rate R of mergers within this box of mass parameters is 

dN ■ 

R=^f- (3.22) 
df 

Following Gair et al. [1], we take the rate R to be proportional to M^^^ . Since we are considering only a rather narrow 
range (a factor of 3) of masses for the inspiraling object (and since the distribution of stellar BH masses is poorly 
known), we approximate R as being independent of These assumptions lead to the following scaling relation 

From the definition of N we immediately obtain p{M) cx M"^/^** and p(/x) cx ^T^. Note incidentally that this also 
gives the probability distribution p.21ep , which was previously derived using the fact that the probability of finding 
a binary between frequency / and f + df is proportional to the fraction of the binary lifetime it spends around that 
frequency. 

We restrict attention to sources at redshift z < 2, partly since the rates at higher redshift are even more highly 
uncertain, and partly since the summed contribution from the z > 2 sources, all individually weak, clearly will be 
much more nearly Gaussian than the noise from the z < 2 population. _ 

The distribution (|3.21cp for p{zi) is then obtained directly from Eq.(lO) of Ref. assuming that the locally 
measured capture rate n scales as n cx (1 -I- z)°-^, i.e. the capture rate decreases over time as t~^/'^. This decrease 
reflects the fact that the MBH first swallows the closest objects, and then it has to wait longer and longer for further 
compact objects to diffuse in 8]. As with the case of galactic white dwarf binaries, we impose a short-distance cutoff 
Zc, reflecting the fact that very nearby sources can be easily identified and taken out of the confusion noise. In this 
paper we adopt the nominal value Zc = 0.1. BH EMRIs closer than Zc would typically have matched- filter SNRs in 
excess of 300 '8], and so should be very easily identified early in the data analysis. The normalization constant Af{zc) 
(defined in Eq. l3.21cp associated with this choice is A/'(Q.l) = 1.03044. 



IV. APPLICATIONS TO VARIOUS SEARCHES 



In this section we apply the statistical tools of section |TT] to the cases of matched-filter searches for MBHBs or 
EMRIs buried in confusion noise. As emphasized in the Introduction, we focus on the weakest resolvable signals 
of these types. We want to assess the importance of the non-Gaussian tails of the SNR distribution in setting the 
appropriate detection thresholds for these searches. 
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A. MBHB search: confusion from galactic white dwarf binaries 

We first consider tlie problem of searcliing for MBHB signals buried in the confusion noise from GWDBs. As a 
particular MBHB template signal chirps upwards in frequency, its track on the t-f plane intersects the tracks of all 
the white-dwarf binaries in the galaxy (up to the final merger frequency of the MBHB) . The GWDBs have random 
parameters, with PDF given by Eqs. (I3.18a[) - (|3.18cp . so each GWDB contributes some amount Xi to the detection 
statistic. 

Our first goal is to obtain the PDF p{xi)^ from which we will estimate Pn{X) using the Edgeworth expansion. Our 
parent variable Xi is an individual overlap given by 

x^ = ■^^^^^MU)Ait,)\Sf,{U)\-'/^ cos[5$, + sgn((5/0^/4] (4.1) 

An important point to notice here is that since our template is a MBHB, it is chirping much faster than any GWDB. 
Thus to a good approximation we have 

\SMti)\=f{k). (4.2) 

In fact, the white dwarf binaries are chirping so slowly that it is reasonable to assume that their frequencies are 
constant for our analysis. 

Eq. (|2.18p requires the first three non-trivial cumulants of the parent variable Xi, which are 



K2 = E[x^] = al (4.3a) 
/«3 = E[xl] (4.3b) 
K4 = E[xt] - 3cr^ (4.3c) 

Clearly, since the random phase is uniformly distributed between and 27r, all odd cumulants of Xi vanish and 
so K3 = 0. We now compute K2 and K4. For the second cumulant, we have 

„2 



K2 = J Xi p{Di)dDip{fi)dfip{ipi)dipi 

= f piD,)dD,p{m,§^I^J- (4.4) 

To go further we need a specific expression for Sn{f)- For simplicity we will approximate 5'„(/) in the band [/min, /max] 
(with /min ~ 10^* Hz and /max ^ 10^^ Hz) by the spectral density of the GWDB background (which does indeed 
work well throughout most of this band), including the contribution from GWDBs with / > 3 mHz (which should be 
resolvable at a later stage of the data analysis). EISA's instrumental noise rises steeply below and above this band, 
so we approximate l/5„(/) as vanishing outside it. Following we therefore approximate 1/S'„(/) by 

= e(/-/min)^(/max-/)^, (4.5) 



ri ( r\ \J mill / " iiAax j / ^ 

with S = 1.44 X IQ-'^-^Rz^/^. This also implies / = (/max ~ /mm)/5' (cf. Ea. [3T4l) . Combining (l3T0l) . ((3T3l) . ([3^, 
(14. 2p and (|4.5p . we obtain the following general formula for all even raw moments: 



Elxf-^=^--'^' 



(2n)!! 



/^""°''p(/.)/""^'rf/. r"^^^ piD,)Dr^-dD,. (4.6) 

"'/min "'A„i„(/,) 



Note that the lower bound in the frequency integral assumes that the initial template frequency /o lies below the 
LISA band lower bound /min- If one is interested in templates that begin inside the LISA band, then the lower bound 
on the frequency integral should be replaced by /q. 

In evaluating the distance integrals, we shall assume here that Dmax 3> i?inin(/i) for /min < fi < /max- We can 
then approximate p{Di) by 

p{D,) ~ e{D, - A„i„)^(A„ax - D,)^—D,, (4.7) 



13 



which simphfies results considerably . The computation of the remaining integrals are straightforward and the results 
are 



AIM 



10/3 



47r/S'2 

(2n- 1)!! 
(2n)!! 



16 n 



8/3 
mill 



'-^ max 




6 ^ ^" 



^m i 11 ( fm ax ) 
(/max-/mi„)(lniHz)ll(«-l)/3 



In 



2(71- 1)D 



2(n-l) 



(1 mHz) 



We then obtain the following expression for the fourth cumulant 



-^min (/mill) 

torn > 2. 




(4.8b) 



4 J ^/min (/max /min) 



64 



X /n 



In 



Aiiiii(l mHz) 
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y - /mill <! In 



^miii(/miii) 




(4.9) 



where / = //(I mHz). Since /min ~ 0.1 and = D^^^/i:>^i„(l mHz) = (10/9th/1.26)2 ~ 10^ the first term in (|49l) 
is clearly much larger than the second and so we can safely drop the —3 term. Substituting the resulting fourth 
cumulant into (|2.18p finally yields the following PDF for the signal-to-noise ratio 



PNiX) = 



"^/miii (/max /iniii)^^ 



5127V 
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X < (/max - /mill) log A + y /max(l + log /max) " /miii(l + log /min) 



H4 



(4.10) 



In the above the number N is the number of galactic white dwarf binaries with gravitational wave frequency above 
/min, our assumcd lower bound of the LISA band^. Using /min ~ 0.1, /max ~ 10, a threshold pth ~ 50, 3 years of 
observation and assuming iV = 3 x 10^ galactic binaries contributing to the confusion noise in the LISA band, the 
relative size of the 1/N correction to the Gaussian PDF predicted by the Central Limit Theorem at the 7ct is 0.02. 
Thus, at the 7 a level, the Edgeworth analysis shows that the non-Gaussian tails of the SNR distribution is negligible 
for this search. 

Finally, notice that if we take the inner cut-off distance I?min(/) as fixed, then the result (|4.10p is actually indepen- 
dent of our overall factor Aq in the waveform amplitudes, and also independent of the magnitude of S in Eq. (j4.5p 
for the noise spectral density appearing in the inner product. Multiplying either of these by an overall factor sim- 
ply re-scales all the Xi, X, and ax by the same amount, while the result (|4.10p is expressed purely in terms of the 
dimensionless ratio Xj a^- 



B. EMRI search: confusion from galactic white dwarf binaries 

Notice here the important fact that our results in IV. B arc independent of the chirp mass of the normalized 
search template. The reason for this is that while the template amplitude (for normalized templates) scales like Mc 

[cf. Ea. (|3.13[) ]. the time over which there is significant overlap with any GWDB signal scales like (/)~^/^ oc Mc 
Therefore the analysis in IV. A applies with practically no modification to searches for EMRIs buried in galactic white 
dwarf confusion noise. In the case of a realistic EMRI search in Gaussian noise, the detection threshold is around 
~ 14(7. At that level the relative correction to Pn{X) predicted by the 1/N term in the Edgeworth series is ^ 0.4. 
Since this correction is of order unity, one should also check the 1 /N'^ term. 



For a signal-to-noise threshold ptj, ~ 50 and three years of observation, the ratio ^^i^/^max is of order 10~® at 0.1 mHz, ~ 10~^ at 
ImHz and ~ 0.03 at 10 mHz. Thus the error from this approximation is negligible throughout the band 0.1 — lOmHz. 
Again, if one is interested in a template which begins inside the LISA band at f = 0, then the number A'^ appearing in ifiTTo)! is the 
number of galactic white dwarf binaries in the frequency interval (/o, /max), with /o being the initial template frequency. 
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Performing that calculation using the results of this section and the next-to-leading term of Edgeworth series (given 
by, e.g., Petrov [l3|) yields a next-to-leading correction of order ~ 0.2. Since this correction term is smaller than the 
leading term, we are inclined to trust the leading order correction to the Gaussian distribution. This confirms that 
the Gaussian approximation is still reasonably accurate at the 14 cr level, for EMRIs buried in galactic white dwarf 
confusion noise. 



C. MBHB search: confusion from EMRIs 



In this subsection we compute the signal-to-noise PDF for a matched-filter search for MBHB signals buried in 
confusion noise from unresolved EMRIs. At any given time it is expected that ^ 10**"^ unresolved EMRIs signals will 
be radiating GWs into the band 0.1 — 10 mHz, representing a significant source of confusion noise. When searching 
for MBHBs in a background of EMRIs, we may again assume that the parameter Sfi appearing in stationary phase 
overlap (|3.15[) is entirely dominated by the chirping MBHB. However, since the radiation reaction timescale for EMRIs 
is comparable to the LISA mission lifetime, our calculation must take into account that some EMRIs that are "live" 
(i.e., are pre-merger) at the beginning of EISA's observation period will "die" (merge) before their t-f track can be 
crossed by the MBHB's t-f track. 



1. Edgeworth expansion 



As before, we wish to compute the cumulants of the parent distribution p{xi) to obtain the Edgeworth expansion 
of P]\f{X). Following subsection IIV A[ we first derive an expression for the raw moments of the parent distribution, 
i.e. 



E[x' 



2n] 



^^^^^,1 J p[Zi)dz^ P[ji)dft p[Mi)dMi p(pi)d^ii 

(2n- 1)!! 
(2n)!! 



Sl-[f{U)] |/(t,)|" 



^0 



(l + z)l""/3 



p{zi)dz, / p{M,)dM,p{^ii)d^ji,p{fi)dUMr'%if f{ti) 



4n/3 2n r^. \lln/3 



(4.11) 



The step function 6{ti) sets to zero the contribution from EMRIs whose t-f tracks do not cross that of the MBHM 
template within the band [/min, /max] (since if ti < 0, the tracks must cross at some frequency below /min)- In terms 
of the following parameters 



f^ 



8/3 



fo, 



2/3 ■ 



(4.12a) 
(4.12b) 



the crossing time ti is given by 



f — — 



Ctt - 1 

1 - pr' 



(4.13) 



which then gives 



til- ai \ Pi-l 



(4.14) 



As before, we shall assume that /o < /min, i-C. the MBHB template begins outside the LISA band, but still at 
high-enough frequency that it has time to sweep through the LISA band during the mission lifetime. Then > 1, 
and combining this with the condition ti > implies that /3,; > a^. 



15 



Switching integration variables from {fi,fJ.i) at fixed Mi to {ai,Pi), we tlien obtain 



(2n) 



A2;/2^4/3jll/3 



47rJS'2 



(l + z)l""/3 



p(zi)d2:i / p{Mi)dM, 



^.^ (3i In 3 



o-2n lln/8 f Pi - 



-lln/8 



wliere Ofmin ~ (/min//o)*^'^i the exact expression given below in (|4.17p . 
Using (|4.14p . we can rewrite the step functions as follows 



mill, max J 



= w a,- — a 



iiiin,maxj ; 



dai 



*mm 9 

a: 



(4.15) 



(4.16) 



where 



*min,max 



/ mm, max 



/o 



8/3 



1 , 1 



Pi Pi \ fo 



I mm. max 



8/3' 



(4.17) 



We then get 



E[x 



2nl 



(2n- 1)!! 
(2n)U 



^2^2^^4/3^11/3 



47r/S'2 



daj ^-2n lln/8 / Pi ~ Q^i 

amin^/^, a, [j—J 



-lln/8 



dp^ 



n , \10n/3 r /-A, 



(4.18) 



Strictly speaking the upper integration limit over should be min(Q;,„axi A)- However we show below in (|4.20p 
that Pi > Ofmax, which explains justifies our limit in the previous equation. Performing the integral yields 



Now from (|4.17p . we have 



(2n- 1)!! 
(2n)!! 



^2^2^/4/3^11/3 



47r/S'2 

./3„.x 8(/3,- l)iW8 
(lln-8)/32 



In 3 



^lln/8-l(^^_^^)-nn/8+l 



f min.max 



/o 



-8/3 



(/^z f )*-^min,max; 



which, when substituted into (|4.19p . yields the following 



(4.19) 



(4.20) 



£;[a;2"] 



(2n- 1)!! 
(2n)!! 



^2/,2^,f4/3/ll/3 



47r/S'2 

-1) 



Oth 

In 3 



fl + zll0«/3 



(lln-8)/32 



/o 



(lln-8)/3 



fn 



fo 



(lln-8)/3' 



dP^ 



(4.21) 



Now since Pi ~ 10^ over the integration range (/3min, Pmax), we may set /3i — 1 ~ /3i to very good accuracy and perform 
the remaining Pi integral, which simply gives In 3, the normalization constant of the Pi probability distribution. Then 
the Mi integral trivially gives unity, as the remaining integrand, apart fromp(Mi), is independent of Mi. We are then 
left with 



£;[a;2"l =. 



(2n-l)!! Bath 


^2^2^,^4/3^11/31 




/max 


(lln-S)/3 


fmhi 


"11 


(27i)!! (lln-8) 


47r/S'2 




[ fo I 




[ fo I 





(1 + ^) 



lOn/3 



-p{zi)dzi. 
(4.22) 
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The redshift integrals must be performed numerically and we denote each value as 



(1 + ^) 



lOn/3 



-piz,)dz, = H^^ C 



For inner cutoff redshift Zmin = 0.1 and Zmax = 2, the first few values of C„ are 

Ci = 4.17924 
C2 = 29.72232 
Ca = 763.5828 

The first two terms of the Edgeworth-expanded PDF for the signal-to-noise ratio are then given by 



(4.23) 



(4.24a) 
(4.24b) 
(4.24c) 



Pn{X) = 



9 



1792 iV 



44/3 



) C2 



^8/3(1 _ ^2 



X 



0{N- 



(4.25) 



where N is the number of EMRIs that lie in the LISA band and where x = /min//max- Taking iV = 5 x 10^, the same 
frequency limits for the LISA band as before and using (|4.24p . the relative size of the 1/A^ correction to the Gaussian 
PDF predicted by the Central Limit Theorem at the la level (i.e., at Xj ^Na^ — 7) is found to be « 8. Since 
this "first-order correction" is already larger than the zeroth-order estimate, the Edgeworth expansion simply cannot 
provide a reliable answer for this problem. Instead, the problem of searching for MBHBs buried in EMRI confusion 
noise must be addressed within the context of the theory of large deviations, to which we turn next. 



2. Large- deviations analysis 

Here we compute the signal-to-noise PDF for a search or MBHBs buried in EMRI confusion noise, following the 
prescription of large-deviations theory. The starting point is the construction of the modified cumulant generating 
functional X{(3). Since we have already computed analytically all the raw moments of the parent distribution, we may 
evaluate A(/3) from its power series expansion numerically to any desired accuracy, i.e. we use 



,A(/3) _ 



00 



(2n)! 



(4.26) 



Defining /? ~ /3ax and using (|4.22p . we obtain the following expression for the cumulant generating functional 



1 °o 52n 

2 ^ 

n=2 



1 _ 2.(lln-8)/3 



lln- 8 



(4.27) 



where 



/3 = 



16x8/3(1 -x)Cl 



1/2 



(3. 



(4.28) 



We then compute the rate function describing the signal-to-noise PDF following these steps. First we compute the 
cumulant generating functional A from (I4.27P numerically. The infinite sum is truncated when the n**^ term of the 
sum is of order 10^^" of the sum of the previous n ~ 1 terms. Because the sum converges"* for any value of /3, we are 



4 This is a consequence of the fact that the parent probability distribution has compact support in our model. 
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1.1 
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2 4 6 8 10 12 14 16 18 20 

signal-to-noise normalized to unit standard deviation 

FIG. 1: This figure shows the rate function of the signal-to-noise PDF for the MBHB search in EMRI confusion noise for a 
simple Newtonian chirp toy model. The rate function is normalized to the value predicted by the Central Limit Theorem, 
namely Iclt = 12. This plot was generated using N = 5 x 10^ for the number of unresolved EMRIs in the the band 
0.1 — lOmHz. If one wants to vary the number of unresolved EMRIs, one simply rescales a;- values by ^ N/ (5 x 10^), since this 
is the multiplicative factor needed to rescale Z to unit standard deviation. 



confident that this is a reasonable accuracy criterion. We next compute A' = d\/ d(3 (here A is considered an implicit 
function of (3) by taking a derivative of (|4.27p and evaluating the sum numerically using the same truncation criterion 
as before. The next step is the maximization over fi (or equivalently /3) of the quantity / = Zp — A = Z/3 — A, where 
Z = Z/ax- The value of (3 which maximizes / is simply the one satisfying Z = A'(/3). By inverting numerically the 
function A', we obtain the function f3{Z). The rate function can then be computed numerically for any desired value 
of Z as follows 

/(Z) = Z/3(Z)-A[/3(Z)]. (4.29) 

The resulting rate function of the signal-to-noise PDF for the MBHB search in EMRI confusion noise is plot- 
ted in Fig[TJ The vertical axis is the actual rate function normalized by the Central Limit Theorem estimate: 
I{Z) /{—Q.bZ'^ / a"^]. The horizontal axis is the SNR normalized to unit standard deviation, or N^^'^Zjox- The result 
is plotted for iV = 5 X 10^ in-band EMRIs, but t o obtain the curve for any other value of A^, one simply re-scales the 
X labels on the horizontal axis by ^J~NJ(yxW). 

From that figure, one can easily see that at the 7 a level, the rate function derived from large-deviations theory 
differs significantly from the Central Limit Theorem estimate (confirming our conclusion from the Edgeworth analysis 
in IV.C.l). While the Central Limit Theorem estimate for Pn{^ = IN^I'^a^) is (27r)"i/2gxp[_49/2], the actual 
probability density is « exp[— 0.72 * 49/2], or a factor ~ 10^ larger. Therefore in deciding the appropriate detection 
threshold, one must take into account the non-Gaussianity of the signal-to-noisc PDF. In the next subsection we 
discuss the proper adjustment of the detection threshold, based on the rate function of FiglTJ 

D. Adjusting the detection threshold 

We have considered searches for MBHB signals buried in two different types of confusion background: GWDBs and 
EMRIs. For GWDBs we showed that the PDF for the SNR could be safely approximated as as Gaussian (up to the 
detection threshold) , but that a search in EMRI confusion noise alone would have to take into account the significant 
non-Gaussianity in Ppf{X) at X ~ 7a. However, EMRI confusion noise is unlikely to dominate the total noise, so 
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in this subsection we show explicitly how to calculate the appropriate detection threshold for noise that is a sum of 
EMRI confusion noise plus Gaussian noise. 

Since confusion noise from GWDBs is Gaussian to a good approximation, it can be combined with instrumental 
noise into one single source of Gaussian noise. We shall here denote the signal-to-noisc ratio obtained by correlating 
a MBHB template with this Gaussian noise as pg. This signal-to-noise is drawn from the following PDF 

P.(P.)= (^^^^""'^^"^^ (4-30) 

where (Jg is the standard deviation of the random variable pg. Next we denote the signal-to- noise ratio obtained by 
correlating a MBHB template with EMRI confusion noise as pc- This signal-to-noise is drawn from the following PDF 



Pc{Pc) = AAcCXp 



2ar 




(4.31) 



where / is the re-scaled rate function plotted in Fig[Tl N* = 5 x 10^ is the number of unresolved EMRIs chosen to 
generate FiglU A/'c is a normalization constant and is the standard deviation of the random variable Pc- Consider 
now the PDF for the total signal-to- noise ratio p = pg + pc- It is given by the following convolution integral 



/ + 00 
Pg{p - Pc)Pc{pc) dpc (4.32) 
-oo 

which can be performed numerically. If the rate function / were equal to unity, i.e. if pc were Gaussian, then p would 
also be a Gaussian random variable with standard deviation a — {a'^ + cr^)^/^. We shall determine a threshold on 

the normalized total SNR p = (cr^ -|- a1)~^/^p, assuming that if the rate function / were equal to unity, then the 
appropriate detection threshold would be set at p = 7. In order words, the acceptable false alarm probability PpA is 
assumed to be the integral of the Gaussian PDF for p over the range (— oo, —7) and (7, -l-oo): 



poo 1 

Pfa = 2 / -=e~p/^dp = erfc(7/\/2). (4.33) 
J 7 v27r 

The actual detection threshold pth for the MBHB search is then determined by the following equation 



Pfa = 



2(7 / p{ap)dp 

Pth 

oo Z'+OO 

= 2(7 I / Pg{crp~ pc)pc{pc)dpcdp 

Pth .^-oo 



+00 



erfc 



{l + e^/^Pth-ep, 



V2 



Pc{Pc)dpc 



(4.34) 



where Pc = Pc/'^c and where e = ffc/o'g measures the relative strength of the non-Gaussian component of the noise. 

In Fig. [2] we plot pth/7 as function of e for our best estimate of Cg (taken from BC2, assuming no GWDBs have 
been fitted out). That is, we fix the amplitude of the GWDB background and plot how pth varies as one increases the 
number of unresolved in-band EMRIs. This figure was generated as follows. For any e we estimated N (the number 
of unresolved EMRIs) using 



N 



1.25 X 107 



1/2 



(4.35) 



(Since an astrophysically reasonable estimate is TV = 5 x 10^, we expect e sa 0.2 in practice.) We insert A'' into 
Eq. (|4.3ip to obtain pc{pc), which we than plug into the last line of Eq. (|4.34p . We obtain the detection threshold pth 
by solving (|4.34|1 numerically. 

The most important fact one gleans from Fig. [2] is that is always very close to one. We can understand this as 
follows. For realistic values of N, the SNR from EMRIs is significantly non-Gaussian, but since the noise is dominated 
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FIG. 2: This figure shows the normahzed detection threshold pth (at fixed false alarm probability) for a total noise composed 
of a Gaussian component (instrumental noise and GWDB confusion noise) and a non-Gaussian component (EMRI confusion 
noise) as a function of the ratio e — Gcjag. In this plot we consider the Gaussian component to be fixed and e varies by adjusting 
the number of unresolved EMRIs. Note that pth is always nearly one, i.e., nearly the same as for a Gaussian distribution with 
the same standard deviation. 



by instrumental and GWDB background noise, the non-Gaussianity of the EMRI background has httle effect on the 
threshold. When iV is large enough that EMRI noise is a large fraction of the total noise, the EMRI confusion noise 
is much more Gaussian, so again the threshold is very close to the Gaussian prediction. 

Fig. [3] is the same as Fig. [5J except that for illustrative purposes we have decreased "by hand" the value of Ug by 
-v/SO. In this case, the normalized threshold pth could be (for e « 0.5) up to ~ 1.3 times higher than for Gaussian 
noise with the same standard deviation. For N = 5x 10^ and this reduced ag, we would have e ~ 1.4 and pth ~ 1-15; 
i.e., the appropriate threshold would be 8cr instead of 7 a. We note that {pthH) — * 1 both as e — > and as e ^ oo. 
This is easily understood, since as e — > the noise becomes just the Gaussian part, while as e — > cxd we also have 
N ^ oo, so the EMRI portion becomes Gaussian. 



V. SUMMARY, CONCLUSIONS, AND OPEN ISSUES 

In this paper we have analyzed the problem of determining the appropriate detection for several idealized searches. 
The most important simplifications were that we used "lowest-order" waveforms (based on the quadrupole formula, 
and assuming quasi-circular inspirals) and simplified population distributions for the confusion sources. We first 
considered searches for both MBHB signals and EMRI signals buried in confusion noise from GWDBs. Using the 
Edgeworth expansion, we showed that for these cases the PDF of the standard detection statistic remains nearly 
Gaussian out to the relevant detection thresholds. We then considered searches for MBHB signals buried in just 
EMRI confusion noise. In that case, using large-deviations theory, we found that 7 a events would occur 10^ times 
more often than suggested by the Central Limit Theorem. However this third case was rather unrealistic , since it is 
very unlikely that EMRI confusion noise will dominate the total LISA noise curve. We then considered a more realistic 
example, in which the EMRI confusion noise was combined with Gaussian noise of ~ 5 times larger amplitude. In 
that case, we again found that the non-Gaussianity of the EMRI confusion noise ends up having a negligible impact 
in setting the appropriate detection threshold. 

The rather minimal impact of non-Gaussian tails in these models appears to stem from three circumstances. First, 
the number of confusion noise sources is always rather large. Second, in all cases we imposed a short-distance cut-off 
on the distribution of the background sources, arguing that the very closest and therefore strongest of the background 
sources could be effectively removed (or otherwise taken into account) before searching for other types of sources. 
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FIG. 3: This plot is similar to Fig[2] but with the strength of the Gaussian component reduced "by hand" by a factor of 50, i.e. 
al ^ so that e = ^iV/(2.5 x 10^). In this case, the detection threshold can be up to ~ 30% higher than for a Gaussian 

distribution with the same standard deviation. 



Third, all three model problems shared the feature that the search templates and background templates evolve in 
frequency on very different timescales: fwD ^ Jemri ^ Jmbhe- Since Xi oc \Sfi\~^^^, this separation of timescales 
ensures that p{x) has no large outliers arising from coincidentally small \Sfi\~^^'^. Put another way, the dissimilarity 
of the searched-for and background signals is crucial to the sharp fall-off of p{x) at large x. The high-X tail of Pn{X) 
depends crucially on the high-x tail of p{x), and the dissimilarity of the searched-for and background signals helps 
ensure a very steep fall-off for p{x). 

We emphasize, however, that LISA data analysis will also present confusion noise problems where there is no such 
separation of timescales. For instance, consider the search for relatively nearby EMRIS signals embedded in the 
background noise from all the unresolvably distant EMRIs. In that case the parent distribution p{xi) would surely 
have a substantial tail, due to cases where Sfi is coincidentally small. Additionally, that detection problem raises 
issues of principle that we were not forced to confront in the model problems considered in this paper, and which 
we do not yet see how to resolve. For example, consider a case where some detection template A has overlap of 5, 
4, 3 and 2 with background signals A, B, C, and D, respectively. Then the total SNR is 14 (assuming the sum of 
all other overlaps can be neglected), which naively might lead one to claim a detection. Should one consider that 
claim as a false alarm? What if most (but not all) of the parameters characterizing A are fairly close to those of A? 
Presumably experience with analyzing large sets of simulated data, as in the current Mock LISA Data Challenges, 
will alert us if such issues arise very often in practice. However if such issues arise only rarely, then our experience 
with this project suggests that a sound theoretical understanding of the tails of the distribution could be crucial, since 
even with powerful computer clusters it could be difficult to sample the tails adequately with simulations. 
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APPENDIX A: HEURISTIC INTRODUCTION TO LARGE-DEVIATIONS THEORY AND 
CORRESPONDANCE WITH EDGEWORTH EXPANSION 

Here we give a heuristic derivation of Chernoff 's formula in large-deviations theory and discuss its relation to the 
Edgeworth expansion. As before, let the sample mean be 



1 ^ 



(Al) 



Now consider the modified cumulant generating functional A(/3) for the sample mean PDF Pm{Z). It is given by 



A(/3) = In / e^^PN{Z)dZ 



TV In 



p{x)dx 



(A2) 



where (3 = (3/N and where A is the modified cumulant generating functional of the parent distribution. Now e''^ is a 
rapidly increasing function of while Pn{Z) is rapidly decreasing. Therefore one expects the integrand to be sharply 
peaked, and the integral to be some constant C of order one times that the value of the integrand at that maximum. 
[Of course, this is just Laplace's method of estimating the integral (jA2p .] Define S{Z) = — \nPN{Z). Then we have 
just argued that A(/3) is well approximated by 



A(/3) = max{/3Z-5(Z)}. 



(A3) 



That is, A(/3) is the Legrendre transform of S{Z), which we can invert to obtain 

S{Z) = max{Z/3- A(/3)} 

/3 



Alternatively we may write 



where 



N max{ZP - A(/3)} . 



=max[/3Z- A(/3)]. 



(A4) 
(A5) 

(A6) 
(A7) 



and C is a normalization constant determined a posteriori. Clearly C is approximately given by C 
[iV/"(0)/(27r)]i/2]i/2. This concludes our heuristic derivation of Chernoff's formula. 
As a pedagogical example, consider the random variable X defined as 



N 



(A8) 



i=l 



where Xi is a random variable equalling +1 or —1 with equal probability. The exact probability distribution for X is 
a binomial, i.e. 



PNiX) 



iV! 



N+l 



1 

I \2 



N+l 



(A9) 
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where Y = X/vN measures how many standard deviations away from the mean the variable X lies. As N tends to 
infinity, we make use of the following refined version of Stirling's formula [1] to approximate (|A9|) as 



1 



12n 360n3 

where the 0„ are all bounded between and 1. By substituting (jAlOp into (jA9p and using Z = Y/\/N, we obtain 



(AlO) 



p^(Z) = ^^(l + ^)-^(l+^)/2-l/2(i_^)-^(l-^)/2-l/2 



X exp 



12N 



(1 + Z) {l-Z)J 360A^3 



'N 



7N{l+Z)/2 ot'Ar(l-Z)/2 



(1 + Z)3 {l-Z)3 



By further expanding in terms of Z ^ 1 and keeping the leading order corrections in we obtain 

1 x2 



Pn{X) 



PNiY) = 



TV V 4 ^ 



e w exp i - I + — - ^ j X [1 + 0{N-')] 



1 



1^ 



(All) 

(Al2a) 
(A12b) 



Let us now derive the large-deviations prediction for P]^{Z\ First the cumulant generating functional is given by 



Maximizing 1{Z^ then yields 



The rate function is therefore given by 



A(/3) = ln£;[e'^^] 
— ln(cosh/3). 



Z = — i-^ = tanh /?. 
dp 



(A13) 



(AM) 



This yields 



1{Z) = Z arctanh Z — ln[cosh(arctanh Z)] 

= i(l + Z)ln(l + Z) + i(l-Z)ln(l-Z). 



P^{Z) = C(l + Z)-^(l + ^)/2(^_^)-7V(l~Z)/2 



(A15) 



(A16) 

for some normalization constant C . Comparing this with (jAlip . we see that it matches exactly the first line of (jAlip . 
neglecting the small —1/2 term in each exponent. Large-deviations theory however does not capture the higher-order 
correction terms provided by that small —1/2 term and the entire second line of (jAlip . which one needs to obtain 
expansion (|A12a[) . How does large-deviations theory "fit in" with the Central Limit Theorem and Edgeworth 
expansion? For simplicity let us assume that "pix) is an even function (i.e., p{—x) = p{x)); clearly Pn^Z) is then 
also even. Presumably the exponent NI{Z) appearing in Chernoff's formula is simply the lowest-order term in an 
expansion in 1/A^: 



P^^Z) = [7vlI(o)/(2^)]^/'e-^(^(^)+^"-'(^)+^"^(^)+-) . 
Now expand each of I{Z), J{Z), and K{Z) as a power series in Z: 

i{z) ^ {i2Z^ + uz^ + ■■■), 

J{Z) = {jo+j2Z^+jAZ^ + ■■■), 

K{Z) = {ko + k2Z^ + kiZ^ + ■■■), 



(A17) 

(Al8a) 
(A18b) 
(Al8c) 
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where the constants jo,ko, - ■ ■ are required for properly normahzing Pn{Z) at each order in N. (There is no term 
io in the expansion of I{Z) because the prefactor [N^{0)/{2tt)]^^^ in Eq. (fXTTl) ensures that Pn{Z) is already 
normalized at lowest order. Of course, «2 = ^^^^(O).) Using = NZ^, we can then re-write Eq. (jA17|) as 



PNiY) 



g-JOg-»2i' gxp 



UY^ 



OiN- 



N 



{ko+32Y^ + i4Y^) + 0{N-^) 



(A19) 



In this form the correspondence with the Edgeworth expansion becomes clear. The normalization of Pn{Z) is fixed 
by Jo and fco to that order, and the 12 term represents the Central Limit Theorem result, with 12 = (2ct^)^^, and 
the terms are the leading-order corrections predicted by the Edgeworth series. For each term in the Edgeworth 
expansion, there is a piece that dominates at large Y . Of course, this is the term that contains the highest power of 
y, e.g., the term i/^Y'^ in Eq. (|A19[) . Large-deviations theory can be thought of as a clever way of summing up all 
these terms to determine the dominant large-K behavior. 

Let us now show how this connection works for the binomial distribution. In that case, the first two cumulants are 
easily shown to be 



K2 = 1, K4 

Thus its Edgeworth expansion is given by [cf. Eq. (|2.18p ] 



-2. 



(A20) 



Pn{Y) = 



1 



-Y'^/2 



1 - 



1 



UN 



(A21) 



If one expands the A^-dependent exponential in (IAl2bp to leading order, one obtains precisely ()A21[) . Now from (|Alip . 
we may identify the functions I{Z), J{Z) and K{Z) appearing in (|A17|) as 



I{Z) = l(i + z)log(l-i-Z)-hi(l-Z)log(l-Z), 
J{Z) = ilog(l + Z) + ilog(l-Z), 



K{Z) = 



1 - 



By expanding each of these functions around Z = 0, we obtain 12 = 1/2, 14 — 1/12, jo — 0, j2 
Substituting these values into (|A19p . we recover precisely Ea. (|A21[ ). 



(A22a) 
(A22b) 

(A22c) 

-1/2, and fco = 1/4. 
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